Stream fusion for multi-stream automatic speech recognition
Authors
Abstract
Multi-stream automatic speech recognition (MS-ASR) has been shown to improve recognition performance in noisy conditions. In such a system, stream generation and stream fusion are the essential components, and both must be designed to reduce the effect of noise on the final decision. This paper shows how to improve the performance of MS-ASR by addressing two questions: (1) how many streams should be combined, and (2) how they should be combined. First, we propose a novel approach based on stream reliability to select the number of streams to be fused. Second, we introduce a fusion method based on Parallel Hidden Markov Models. Applying the method to two datasets (TIMIT and RATS) with different noise types, we show an improvement in MS-ASR performance.
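The two questions in the abstract (which streams to keep, and how to combine them) can be illustrated with a small sketch. This is not the paper's method: the abstract does not define its reliability measure, and the Parallel Hidden Markov Model fusion is not reproduced here. As stand-ins, the sketch assumes an entropy-based reliability score and a reliability-weighted log-linear combination of per-frame class posteriors. The stream names, the `ratio` threshold, and all function names are hypothetical.

```python
import math

def mean_entropy(posteriors):
    """Average per-frame entropy (in nats) of one stream's class posteriors."""
    total = 0.0
    for frame in posteriors:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)

def reliability(posteriors):
    """Assumed reliability score: 1 minus entropy normalized by log(n_classes)."""
    n_classes = len(posteriors[0])
    return 1.0 - mean_entropy(posteriors) / math.log(n_classes)

def select_streams(streams, ratio):
    """Keep streams whose score is at least `ratio` times the best score."""
    scores = {name: reliability(p) for name, p in streams.items()}
    best = max(scores.values())
    return {name: s for name, s in scores.items() if s >= ratio * best}

def fuse(streams, scores):
    """Log-linear fusion of the selected streams, weighted by reliability."""
    total = sum(scores.values())
    if total <= 0.0:
        # Degenerate case (all streams uninformative): use equal weights.
        weights = {name: 1.0 / len(scores) for name in scores}
    else:
        weights = {name: s / total for name, s in scores.items()}
    names = list(weights)
    n_frames = len(streams[names[0]])
    n_classes = len(streams[names[0]][0])
    fused = []
    for t in range(n_frames):
        log_scores = [
            sum(weights[n] * math.log(max(streams[n][t][c], 1e-12)) for n in names)
            for c in range(n_classes)
        ]
        peak = max(log_scores)  # subtract the peak for numerical stability
        unnorm = [math.exp(v - peak) for v in log_scores]
        z = sum(unnorm)
        fused.append([v / z for v in unnorm])
    return fused

# Toy posteriors: 3 hypothetical streams, 2 frames, 3 classes.
streams = {
    "clean_band": [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]],
    "mid_band": [[0.70, 0.20, 0.10], [0.20, 0.60, 0.20]],
    "noisy_band": [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]],
}
selected = select_streams(streams, ratio=0.3)
fused = fuse(streams, selected)
print(sorted(selected))
print([frame.index(max(frame)) for frame in fused])
```

In this toy setup the near-uniform stream falls below the threshold and is excluded, so the fused decision depends only on the two more confident streams. A real system would derive reliability from its own criterion and fuse at the model level rather than per frame.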
Related resources
DBN based multi-stream models for speech
We propose dynamic Bayesian network (DBN) based synchronous and asynchronous multi-stream models for noise-robust automatic speech recognition. In these models, multiple noise-robust features are combined into a single DBN to obtain better performance than any single feature system alone. Results on the Aurora 2.0 noisy speech task show significant improvements of our synchronous model over bot...
Using the Multi-Stream Approach for Continuous Audio-Visual Speech Recognition: Experiments on the M2VTS Database
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous decoding of continuous speech while still allowing for some asynchrony of the visual and acoustic information streams. First the Multi Stream f...
Ensemble Feature Selection for Multi-Stream Automatic Speech Recognition
Recognition using speech synthesis: a reactive dynamic for robust ASR
Automatic Speech Recognition (ASR) systems are not efficient under noisy speech. In the Multi-Stream (MS) approach, commonly used to reinforce ASR robustness, each stream feeds one recognizer generating estimates which are combined through a fusion process. As some streams are optimal for transmission of some phonemes [1,3], it is then interesting to over weight the best stream during the featu...
Adaptive Audio-visual Speech Recognition in the Presence of Audio and Video Distortions
Audio-visual speech recognition leads to significant improvements compared to pure audio recognition especially when the audio signal is corrupted by noise. In this article we investigate the consequences of additional degradations in the video signal on the audio-visual recognition process. We degrade the images with noise, a JPEG compression, and errors in the localization of the mouth regio...
Journal title: I. J. Speech Technology
Volume 19
Publication date: 2016